(54) Document creation aid 

(57) A document creation aid 1 is provided which can automatically generate various types of layouts with attribute boxes 
from documents whose layouts serve as models for the layouts. The aid defines regions which are different from one 
another in accordance with attributes (e.g. text, tables, pictorial) of images included in the document and processes each of 
the regions differently. The aid includes an image input section 2 for reading an image of the document serving as a model. 
A layout analysis section 3 detects attributes of the images read from the document and identifies layout data indicating the 
size, position, and attribute of each of the regions based on differences in the judged attributes. A layout generating 
instruction section 4 sets the same regions as those of the document serving as a model for a document to be created 
based on the identified layout data. 
At least one drawing originally filed was informal and the print reproduced here is taken from a later filed formal copy. 
DOCUMENT CREATION AID 

The present invention relates to a document creation aid. 
Such an aid may be suitable for use with word processors 
and workstations. The document creation aid can be used 
for electronically producing document layouts of a 
complicated pattern in which heterogeneous data such as 
characters, graphics, and images are mingled. 

In creating documents using document creating devices 
such as word processors and workstations, it is sometimes 
desired that different types of data are included within 
a document, e.g. characters; graphics represented by 
vector data such as straight lines and curved lines; and 
bit map images specified by bit patterns such as images 
read by image readers. To create documents having 
heterogeneous data including characters, vector-expressed 
graphics, and bit map images, the data processing mode 
differs from one type of data to another within the 
document creating device. To improve processing 
efficiency under such circumstances, boxes such as a 
character box, a graphic box, and an image box are 
defined in a document and, when creating or editing the 
document, the attributes of each processing box are 
judged and the box is processed in accordance with the 
judged attributes. Throughout this specification, 
according to the context, the term "image" is to be 
broadly construed as including characters, vector- 
expressed graphics, bit map images, and the like, and 
narrowly construed as indicating only bit map images. 

In order to create a document having a mixture of 
characters, vector-represented graphics, bit map images, 
and the like using the conventional document creating 
devices, the document creator has to specify the boxes 
corresponding to the attributes in the document and 
adjust their sizes and positions. 

However, such box specification and adjustment operations 
must be performed for each type of box, i.e. the 
character box, the graphic box, and the image box. In 
addition, when there exists a plurality of boxes defined 
by the same attribute, the same operation must be 
repeated. This has made the conventional document 
creating operation cumbersome and time-consuming. 
Particularly, when an analogous document exists and it is 
desired that the same layout as that of the analogous 
document be used, the above input operations must be 
followed one by one, requiring the user to spend time 
recreating the layout of the document. 

One technique to reduce layout preparation time is to 
prepare a plurality of types of documents, each having a 
standard layout, in advance and to search and copy a 
document having the desired layout, and enter the desired 
data into that document whenever necessary. 

With this technique, however, all the standard layouts 
must be pre-stored, requiring a labour-intensive initial 
set-up operation. In addition, documents with all the 
standard layouts must be stored, which demands an 
enormously large storage capacity. Further, to use the 
stored layouts, a search for the target layout is always 
required. If the number of stored layouts is large, such 
a search is not an easy job. 

According to a first aspect of the invention, there is 
provided a document creation aid as defined in the 
appended Claim 1. 
According to a second aspect of the invention, there is 
provided a document creation aid as defined in the 
appended Claim 2. 

According to a third aspect of the invention, there is 
provided a method as defined in the appended Claim 4. 

Preferred embodiments of the invention are defined in the 
other appended claims. 

It is thus possible to provide a document creation aid 
which can generate various types of layouts with 
attribute boxes automatically when there are documents 
whose layouts serve as models for the layouts of 
documents to be created, thereby improving document 
creation efficiency. 

In a preferred embodiment, a document serving as a model 
is read and the regions in the read image are identified 
by the layout analysis section when a layout of the 
document is to be specified. For example, the 
circumscribed rectangles of some coupled images included 
in the document are identified, and whether each 
circumscribed rectangle is a character region, a graphic 
region, or an image region is judged by the size, 
position, and the like of the circumscribed rectangle. 
Then, the corresponding regions are set in a 
document to be newly created, based on these analyzed 
regions, by instructions from the layout generation 
instruction section. Accordingly, the newly created 
document has the respective regions set with the same 
layout as that of the model. 

The invention will be further described, by way of 
example, with reference to the accompanying drawings, in 
which: 

Figure 1 is a block diagram showing the configuration of 
a document creating device including a document creation 
assist device constituting an embodiment of the 
invention; 

Figure 2 is a block diagram showing a configuration of a 
layout analysis section; 

Figure 3 is a flow chart showing an example of a 
procedure for region analysis; 

Figure 4 is a flow chart showing an example of a 
procedure for character portion analysis; 

Figure 5 is a flow chart showing an example of a 
procedure for graphic portion analysis; and 

Figures 6A and 6B are diagrams schematically illustrating 
examples of document images and an example of a layout 
analysis. 

Figure 1 shows the configuration of a document creating 
device, such as a word processor, including a document 
creation assist system. A document creation assist 
device 1 includes: an image input section 2 which reads a 
document as an image; a layout analysis section 3 which 
defines a document layout based on the images read from 
the document; and a layout generation instruction section 
4 which instructs generation of graphic boxes, character 
boxes, image boxes, or the like in accordance with the 
analyzed layout. The layout specified by the layout 
generation instruction section 4 is represented by a 
layout casting section 5 so that the user can see the 
layout. Data entered from a keyboard/mouse 6 are 
converted to characters and graphics by a 
character/graphic input section 7, entered into each box 
which has already been laid out by a content casting 
section 8, and displayed by a display section 9. When 
the layout is input manually from the keyboard/mouse 6, a 
layout co-ordinate/attribute input section 10 converts 
the data from the keyboard/mouse 6 and the converted data 
are sent to the layout generation instruction section 4. 

Figure 2 shows a configuration of the layout analysis 
section 3. The images read from the document by the 
image input section 2 are divided into regions such as a 
character region, a vector-represented graphic region, 
and a bit map image region. In the character region, 
data (character box/character portion attribute data) 
such as the position of the character region, the size of 
characters, the distance between character strings, and 
the direction of the character strings are identified by 
a character portion analysis section 12. In the graphic 
region, data such as the outer frame of a graphic, the 
frame of a table, ruled lines and column/row partitioning 
lines of tables are detected and output as vector data 
(graphic box/line data). In the image region, data are 
produced as box data. 

The operation of the layout analysis section 3 will be 
described next. 

The document images serving as a model layout are input 
by the image input section 2 (see Figure 1), and the 
document images represented in binary-coded form are 
analyzed by a region analysis section 11. Although 
region analysis techniques are not particularly limited, 
one example of such a technique identifies circumscribed 
rectangles, each enclosing images which are coupled to 
one another therein, and classifies each rectangle by its 
size, position, and the like. 
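
The identification of circumscribed rectangles can be sketched as follows. This is a minimal illustration only, assuming that the binary-coded image is held as a list of rows with 1 for a black pixel, and that "coupled" images means groups of 8-connected black pixels; the function name circumscribed_rectangles and the tuple layout are hypothetical and are not taken from the embodiment.

```python
from collections import deque


def circumscribed_rectangles(bitmap):
    """Return one (left, top, right, bottom) box, inclusive, per group of
    8-connected black pixels (1 = black, 0 = white)."""
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if bitmap[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited black pixel.
            left = right = x
            top = bottom = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                # Visit the 3x3 neighbourhood, clipped to the image.
                for ny in range(max(cy - 1, 0), min(cy + 2, height)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, width)):
                        if bitmap[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes
```

The boxes are returned in raster-scan order of their first pixel; any other connected-component technique could be substituted.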

Figure 3 shows an example of a procedure for a region 
analysis. 

Circumscribed rectangles are identified (Step S101). Any 
overlapping circumscribed rectangles are integrated into 
a single circumscribed rectangle which incorporates all 
such overlapping circumscribed rectangles (Step S102). 
Those circumscribed rectangles whose vertical or 
horizontal length exceeds the size of a character can be 
identified as a bit map image or a vector-represented 
graphic (Step S103). For example, if characters of up to 
36 point in size are used, any circumscribed rectangle 
whose height or width is larger than about 13 mm is 
judged to be a bit map image or a vector-represented 
graphic. If a circumscribed rectangle, among large 
circumscribed rectangles, satisfies the conditions that 
its height-to-width ratio is close to 1, for example 
between 1/3 and 3, and the percentage of black pixels 
contained therein is relatively high, then it can be 
identified as an image region (Step S104). Other large 
circumscribed rectangles can be deemed as graphic 
regions. Small circumscribed rectangles are judged to be 
characters, and this can be verified by checking the 
periodicity of the characters in their horizontal and 
vertical directions (Step S105). A set of circumscribed 
rectangles whose character periodicity in both the 
horizontal and vertical directions is substantially 
constant is identified as a group of characters having 
the same attribute, i.e. as a character region (Step 
S106). The above processing identifies character 
regions, graphic regions, and image regions, and provides 
data about the size and position of each region. 
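
Steps S102 to S104 can be sketched as follows. This is an illustration under stated assumptions, not the embodiment itself: boxes are (left, top, right, bottom) tuples already converted to millimetres, the names merge_overlapping and classify are hypothetical, and the density_threshold of 0.4 is an arbitrary stand-in for the "relatively high" percentage of black pixels, which the description does not quantify. The character limit follows from 36 point x 25.4/72 mm per point = 12.7 mm, i.e. about 13 mm.

```python
POINT_MM = 25.4 / 72          # one typographic point (1/72 inch) in mm
MAX_CHAR_MM = 36 * POINT_MM   # 36 pt characters -> 12.7 mm


def overlaps(a, b):
    """True if two (left, top, right, bottom) boxes share area or an edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge_overlapping(boxes):
    """Step S102: fuse overlapping boxes until no two boxes overlap."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlaps(boxes[i], boxes[j]):
                    a, b = boxes[i], boxes[j]
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan with the enlarged box
    return boxes


def classify(box, black_ratio, density_threshold=0.4):
    """Steps S103-S104: label a box as character, image or graphic.

    black_ratio is the fraction of black pixels inside the box;
    density_threshold is an assumed value, not given in the description.
    """
    width = box[2] - box[0]
    height = box[3] - box[1]
    if width <= MAX_CHAR_MM and height <= MAX_CHAR_MM:
        return "character"  # small box: candidate character
    aspect = height / width if width else float("inf")
    if 1 / 3 <= aspect <= 3 and black_ratio >= density_threshold:
        return "image"
    return "graphic"
```

The periodicity check of Steps S105 and S106 is not covered by this sketch; boxes labelled "character" here are only candidates until that check is made.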

While each region is identified by the size and height- 
to-width ratio of each circumscribed rectangle in the 
above region analysis, other analysis techniques may be 
applicable, for example an algorithm merging both a 
marginal distribution method and a black points 
concatenation method such as disclosed in "Automatic 
Document Recognition System" (Kida et al., Image 
Electronics Society Journal, Vol. 15, No. 2, 1986, 
pp. 107-115). 

The character portion analysis section 12 will be 
described next. The section 12 analyses the pattern of 
regions identified as character regions by the region 
analysis section 11. Analysis techniques are not 
particularly limited for this section 12. One technique 
is to utilise the previously identified circumscribed 
rectangles and their periodicity. Figure 4 outlines an 
example of a procedure thereof. 

The direction of a character string is first detected. 
The distance between characters in a character string is 
usually smaller than that between character strings. 
First, the distances between circumscribed rectangles are 
identified (Step S201), and the direction, either 
horizontal or vertical, in which the average distance 
between the circumscribed rectangles is smaller is judged 
as the direction in which the characters are arranged 
(Step S202). That is, when the distance in the vertical 
direction is larger than that in the horizontal 
direction, it is judged that the characters are written 
horizontally and thus the distance between the character 
strings is identified as having a vertical periodicity 
(Step S203). Otherwise, it is judged that the characters 
are written vertically and thus the distance between the 
character strings is identified as having a horizontal 
periodicity (Step S204). 

The size of a character is then calculated from the 
maximum values of the height and width of each 
circumscribed rectangle in the region (Step S205). As a 
result of the above processing, the attributes within 
each character region can be identified. 
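
Steps S201 to S205 can be sketched as follows, purely by way of example. The sketch assumes boxes given as (left, top, right, bottom) tuples, measures for each box the gap to its nearest neighbour along each axis, and compares the average gaps; the names nearest_gaps and analyse_character_region and the returned dictionary keys are hypothetical.

```python
def nearest_gaps(boxes, axis):
    """Gap from each box to its nearest following neighbour along `axis`
    (0 = horizontal, 1 = vertical). Only neighbours whose extent on the
    other axis overlaps the box are considered (Step S201)."""
    other = 1 - axis
    gaps = []
    for a in boxes:
        candidates = [
            b[axis] - a[axis + 2]
            for b in boxes
            if b is not a
            and b[axis] >= a[axis + 2]       # b starts after a ends
            and b[other] <= a[other + 2]     # extents overlap on the
            and a[other] <= b[other + 2]     # other axis
        ]
        if candidates:
            gaps.append(min(candidates))
    return gaps


def mean(values):
    """Average of a list, or infinity when there is nothing to average."""
    return sum(values) / len(values) if values else float("inf")


def analyse_character_region(boxes):
    """Return writing direction, gap between strings and character size."""
    h_mean = mean(nearest_gaps(boxes, 0))
    v_mean = mean(nearest_gaps(boxes, 1))
    # Step S202: characters run along the axis with the smaller gap.
    horizontal = v_mean > h_mean
    # Step S205: character size from the largest box height or width.
    size = max(max(b[2] - b[0], b[3] - b[1]) for b in boxes)
    return {
        "direction": "horizontal" if horizontal else "vertical",
        "string_gap": v_mean if horizontal else h_mean,
        "char_size": size,
    }
```

For a region containing a single box both averages are infinite, and the sketch falls through to "vertical" as in the "otherwise" branch of the procedure; a practical implementation would probably treat that case separately.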

The graphic portion analysis section 13 analyzes the 
regions judged as graphic regions by the region analysis 
section 11 and represents the line drawings therein in 
terms of vectors. Although it is possible to express all 
the detected line drawings in vectors, only long straight 
line segments, both horizontal and vertical, which can be 
considered as being important and thus repetitively 
usable in terms of a pattern of layout are identified. 
These line segments include: frame data, ruled lines, 
table forming frames consisting of vertical and 
horizontal straight lines, and column/row partitioning 
lines. 

Figure 5 shows an example of a procedure for the graphic 
portion analysis. 

One technique of identifying horizontal and vertical 
lines is to trace black pixels substantially 
horizontally and substantially vertically and to select 
those runs exceeding a predetermined length (Steps S301, 
S302). The obtained horizontal and vertical lines are 
converted to vector data represented by data such as the 
start point/end point data and the width (Step S303). 
Among these vector-represented lines, any single line 
segment which stays alone or away from the others is 
judged as a row or a column partitioning line depending 
upon its orientation (Step S304). A portion comprised of a 
combination of a plurality of horizontal and vertical 
lines is judged as part of a table (Step S305), while a 
portion formed by a single rectangle with horizontal and 
vertical lines is deemed to be a graphic box (Step S306). 
Otherwise, the portions are considered as general 
graphics. 
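
Steps S301 to S303 can be sketched as follows. This is a simplified illustration which traces exactly horizontal and exactly vertical runs only, whereas the procedure allows substantially horizontal and vertical lines; the names horizontal_runs, vertical_runs and with_width, and the tuple layouts, are hypothetical. The classification of Steps S304 to S306 is not covered.

```python
def horizontal_runs(bitmap, min_length):
    """Return (start, row, end) for each horizontal run of black pixels
    (value 1) that is at least min_length pixels long."""
    segments = []
    for y, row in enumerate(bitmap):
        start = None
        # A trailing white sentinel closes a run that reaches the edge.
        for x, pixel in enumerate(list(row) + [0]):
            if pixel == 1 and start is None:
                start = x
            elif pixel != 1 and start is not None:
                if x - start >= min_length:
                    segments.append((start, y, x - 1))
                start = None
    return segments


def vertical_runs(bitmap, min_length):
    """Same tracing on the transposed bitmap: (start, column, end),
    where start and end are row indices."""
    transposed = [list(column) for column in zip(*bitmap)]
    return horizontal_runs(transposed, min_length)


def with_width(runs):
    """Fuse identical runs on consecutive rows (or columns) into a single
    vector (start, first_position, end, width)."""
    lines = []
    for start, pos, end in sorted(runs, key=lambda r: (r[0], r[2], r[1])):
        last = lines[-1] if lines else None
        if last and last[0] == start and last[2] == end \
                and last[1] + last[3] == pos:
            lines[-1] = (start, last[1], end, last[3] + 1)
        else:
            lines.append((start, pos, end, 1))
    return lines
```

A thick ruled line scanned as several adjacent runs is thus reported once, with its start point, end point and width.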

The layout analysis is completed with the above 
processing, and the document images read by the image 
input section 2 are broken down into the character 
region, the image region, and the graphic region. The 
graphic region includes tables and lines. The character 
region additionally includes attributes such as the 
character size, the distance between the character 
strings, and the direction of the character strings. The 
layout generation instruction section 4 shown in Figure 1 
instructs layout casting based on the above data to the 
layout casting section 5, so that each region can be 
arranged as instructed. 
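
The layout data handed from the layout analysis section 3 to the layout generation instruction section 4 might be represented as follows. This is only a sketch of one possible data structure; the class names Region and Box, their fields, and the function generate_layout are hypothetical and do not appear in the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """One analysed region of the model document."""
    kind: str       # "character", "graphic" or "image"
    x: float        # position of the top-left corner
    y: float
    width: float
    height: float
    # E.g. character size, string gap and direction for character regions.
    attributes: dict = field(default_factory=dict)


@dataclass
class Box:
    """An empty box in the new document, to be filled in by the user."""
    region: Region
    content: object = None


def generate_layout(model_regions):
    """Create one empty box per analysed region, copying its geometry and
    attributes so that later edits do not alter the model layout data."""
    return [
        Box(Region(r.kind, r.x, r.y, r.width, r.height, dict(r.attributes)))
        for r in model_regions
    ]
```

Copying the attributes, rather than sharing them, lets the boxes of the new document be edited without changing the layout data obtained from the model.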

For example, if a document serving as a model including a 
character portion a, a bit map image portion b, a graphic 
portion c, a partitioning line d, and a table e as shown 
in Figure 6A is read by the image input section 2, then a 
layout analysis on this document generates a layout which 
is schematically shown in Figure 6B. In Figure 6B, 
reference character A designates a character region; B, 
an image region; C, a graphic region; D, a partitioning 
line; and E, a table structure. Since the respective 
regions identified by the layout analysis have been 
displayed on a screen of the display section 9, the user 
then specifies a region into which characters or graphics 
are entered via the keyboard/mouse 6. Thus, the user 
only has to input the desired data into 
the corresponding regions via the keyboard/mouse 6 to 
create a document having the desired layout. 

To insert a bit map image directly into a document, the 
image data input from the image input section 2 may be 
sent to the content casting section 8 for its synthesis 
with the document. 

The user may edit the layout of the respective regions 
generated by the layout analysis as desired, and the 
contents may be laid out in the edited regions. 

While document creation is taken as an example of use of 
the document creation aid, it has other uses such as in 
the creation of pre-formatted documents, such as a slip. 

While, for tables, only the vertical and horizontal lines 
are reproduced based on the vector data obtained by the 
analysis in the above embodiment, data that indicate 
attributes such as rows and columns of tables may be 
output instead of vector data if the document creating 
device can generate and manage table structures. 

While the only data identified in a character portion in 
the above embodiment are the size of characters, the 
distance between character strings, and the writing 
direction, either vertical or horizontal, data such as the 
difference in calligraphic style or the difference in 
language, e.g. Japanese or English, may be identified and 
included as additional attributes. 

By arranging for respective characters to be recognised 
in a character region and for respective graphics to be 
recognised in a graphic region, not only the layout 
data but also the contents can be re-used. 

It is thus possible to generate a layout automatically 
based on the images read from a document. Therefore, 
when creating a document having an analogous layout, the 
user does not have to perform region setting and revising 
operations at all, or at least such operations are 
reduced into simpler ones, thereby improving document 
creation efficiency. In addition, it is no longer 
required to prepare or store documents with standard 
layouts in advance, thereby dispensing not only with the 
preparatory labour but also with a large capacity storage 
unit. 
CLAIMS 

1. A document creation aid, comprising: 

image input means for reading an image of a first 
document serving as a model; 

layout analysis means for analyzing the image of 
the first document and for identifying the layout of the 
first document in respect of the position, size and at 
least one attribute of at least one region included 
within the first document; and 

layout generation instruction means for 
instructing regions to be set in a second document so 
that the layout of the second document is a substantial 
copy of the layout of the first document. 

2. A document creation assist device for use in a 
document creating device which, when a document is 
created, defines regions which are different from one 
another in accordance with attributes of images included 
in said document and processes each of said regions 
differently, said document creation assist device 
comprising: 

an image input section for reading said document 
serving as a model as an image; 

a layout analysis section for judging attributes 
of said images read from said document and identifying 
layout data indicating the size, position, and attribute 
of each of said regions based on differences in said 
judged attributes; and 

a layout generation instruction section for 
instructing to set the same regions as those of said 
document serving as a model to a document to be created 
based on said identified layout data. 

3. A document creation assist device according to 
Claim 2, wherein said layout analysis section includes: 

a region analysis section for identifying a 
character region, a graphic region, and an image region 
from said images of said document; 

a character portion analysis section for analyzing 
an attribute of a character portion within said character 
region; and 

a graphic portion analysis section for identifying 
a line component within a graphic by analyzing a graphic 
portion within said graphic region and outputting said 
line component as vector data. 

4. A method of formatting a second document from a 
first document, comprising the steps of analyzing an 
image of the first document to determine the relative 
size, relative position and at least one attribute of at 
least one region of the first document, and generating 
instructions for the layout of the second document such 
that the second document has substantially the same 
layout as that of the first document. 

5. A document creation aid substantially as 
hereinbefore described with reference to and as 
illustrated in the accompanying drawings. 

6. A method of formatting a document substantially as 
hereinbefore described with reference to the accompanying 
drawings. 